Network engineers wake up to computational storage


A well-known technology news and analysis site (not this one) used to run the Australian version of its data storage pages under a title banner that read SNORAGE, followed by “all things data & storage”, or words to that effect, after that amusing title.

But while storage is snorage for some, data has obviously become the lifeblood of everything. This means that the methods by which data is filed away, accessed and processed have become enough of an issue to wake up even the sleepier storage souls. Of all the actions that stored data undergoes, from Extract, Transform & Load (ETL) through to functions such as deduplication, parsing and analytics, it is the live processing of data that is arguably driving the biggest growth subsector in this field.

Computational storage exists to process data at the storage plane (on a storage device) rather than have it move backwards and forwards to the compute plane (near the CPU) to gain access to microprocessor power and other application code that might affect its journey through life. If that has woken up a few storage somnambulists, all well and good. But before we define the parameters, performance peaks and procedural peculiarities of computational storage, let’s be very clear about why computational data storage exists and what key advantage it gives us.

As we have said, computational storage means that processing can happen on the storage device, so it’s well suited to real-time applications because there’s little or no internal system transport delay. What this means at the higher level is that storage infrastructure and compute infrastructure are becoming one and the same thing.
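To make that contrast concrete, here is a minimal sketch of the two paths in Python. The csd object and its execute method are purely hypothetical stand-ins for a drive-side offload interface, not any real vendor’s or standard’s API; only the shape of the idea matters here.

```python
# Hypothetical comparison of host-side versus drive-side processing.
# The conventional path ships every block across the bus to the CPU;
# the computational path runs the filter on the drive itself and
# returns only the answer.

def count_matches_conventional(block_device: str, pattern: bytes) -> int:
    """Conventional path: all data travels to the host to be filtered."""
    hits = 0
    with open(block_device, "rb") as dev:
        while chunk := dev.read(1 << 20):  # read 1 MiB at a time
            hits += chunk.count(pattern)   # filtering happens host-side
            # (matches spanning chunk boundaries are ignored for brevity)
    return hits

def count_matches_computational(csd, pattern: bytes) -> int:
    """Computational path: the drive's own processor does the filtering,
    so only a tiny result crosses the bus. 'csd.execute' is an assumed,
    illustrative API, not a real one."""
    return csd.execute(program="count_matches", arg=pattern)
```

In the first function the entire dataset crosses the internal transport just to be inspected; in the second, only an integer comes back, which is exactly the “little or no internal system transport delay” advantage described above.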

But it’s important that we don’t depreciate or denigrate the innovations that have happened in storage over the last quarter-century. We should not (as some do in this space) say that conventional storage architectures have altogether failed to keep pace with modern application data storage needs. At its core, computational storage is really there to help us get around bottlenecks and compute smarter.

Bandwidth bottlenecks & Brazilian chainsaws

Not all the transport journeys that our data takes from the storage layer to the CPU, to a big data analytics engine, to an Artificial Intelligence engine, onward to the dashboard visualization layer and back again to storage are as energy-efficient as they could be. That’s energy as in IT system energy and capacity, and energy as in national grid energy and capacity. It’s a long and complex calculation, but the more bandwidth we save at the network level, the fewer trees we have to chop down somewhere inside a Brazilian rainforest.
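As a back-of-envelope illustration of that saving, consider a hypothetical scan-heavy query with made-up but plausible numbers: 10 TB of stored data in which only 1% of records are actually relevant to the host.

```python
# Illustrative arithmetic only; the figures are assumptions, not benchmarks.
scan_size_tb = 10.0   # data the query must inspect
selectivity = 0.01    # fraction of that data the host actually needs

host_side_tb = scan_size_tb                  # whole dataset crosses the bus
device_side_tb = scan_size_tb * selectivity  # only the matches cross the bus

print(f"host-side filtering moves   {host_side_tb:.1f} TB")   # 10.0 TB
print(f"device-side filtering moves {device_side_tb:.2f} TB") # 0.10 TB
```

A hundredfold reduction in data movement, under these assumed numbers, is the kind of bandwidth (and energy) saving the chainsaw metaphor is gesturing at.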

We are not shifting ‘all’ our data from traditional storage to computational storage; we are still saying ‘some’ or in fact ‘plenty’ of data should make the normal journey between compute and storage. The difference here is that we’re able to distribute workloads with a higher level of holistic control to avoid application bottlenecks. With more and more computing happening out on the ‘edge’ inside our Internet of Things (IoT) devices, it’s clearly a good time to develop computational storage that can be used in all types of live software scenarios.

Knowing which data to expose to which methods is another story, but some of the answer will be determined by just how structured, semi-structured or unstructured the data itself is – plus of course the type of application or data service it exists in.

If all this sounds like a process of reengineering that could cause disruption and confusion, then perhaps it’s good to know that storage enthusiasts are a fairly stoic and determined bunch: the Storage Networking Industry Association (SNIA) was established and incorporated back in 1997. If nobody in your team thinks they have the patience and attention to detail needed to worry about the taxonomy, the development roadmap and the standardization of interfaces, protocols and features in computational storage, then that’s okay, because the SNIA boys and girls live for this stuff.

Computational storage evangelists argue that there is a security advantage in having processors actually located on the storage drive controller to perform a processing operation. It’s a simple enough concept: the data hasn’t actually moved off its host device, so it’s less exposed to the vulnerabilities of the outside world. The remote host processor (back on the system motherboard) is also freed up to perform other tasks, some of which could be security-related, so a possible virtuous circle is achieved.

Computational Storage Drives (CSDs) power up

While not all members of the SNIA actually manufacture Computational Storage Drives (CSDs), they all have enough vested interest in this development space to make sure they’re actively involved. Among the more prevalent CSD manufacturers currently are Samsung, Eideticom, Nyriad, ScaleFlux and NGD.

Computational Storage Drives (CSDs) themselves are not Hard Disk Drives (HDDs) with a spinning disk and a mechanical arm (known as an actuator) that magnetically performs read and write operations. Instead, CSDs are built on Solid State Drive (SSD) technology, constructed from various densities of interconnected silicon flash memory chips, where read and write operations happen without any moving parts. Some CSDs run a distribution of Linux to add the compute function, while others use Field Programmable Gate Arrays (FPGAs), but let’s leave the integrated circuit nomenclature there and say that some CSDs offer fixed functions, while others are programmable.
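That fixed-versus-programmable split can be sketched in code. Both classes below are invented for illustration (no vendor ships this interface): a fixed-function drive exposes a set menu of built-in operations, while a programmable drive accepts user-supplied programs for its on-drive Linux system or FPGA.

```python
# Illustrative sketch only: these classes and methods are assumptions,
# not any real CSD vendor's interface.

class FixedFunctionCSD:
    """A drive whose on-board compute offers a fixed menu of operations."""
    BUILT_IN = {"compress", "decompress", "encrypt", "decrypt"}

    def execute(self, op: str, start_block: int, block_count: int):
        if op not in self.BUILT_IN:
            raise ValueError(f"{op!r} is not built into this drive")
        ...  # firmware/ASIC performs the operation in place on the drive


class ProgrammableCSD:
    """A drive that runs user-supplied programs (on-drive Linux or FPGA)."""

    def load_program(self, image: bytes) -> int:
        """Download a program or FPGA bitstream; return a slot handle."""
        ...

    def execute(self, slot: int, start_block: int, block_count: int):
        """Run the previously loaded program against on-drive data."""
        ...
```

The practical difference is flexibility: the fixed drive is simpler and cheaper to validate, while the programmable drive lets the data-developer community invent functions the manufacturer never anticipated.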

What matters most at this comparatively early stage for a still-nascent technology is how it is being prototyped and implemented in the real world, and what the data-developer community does to proliferate and extend it. As smaller data-estate working groups start to create new functions for esoteric use cases with computational storage, the technology’s proof-of-concept parameters will become more clearly defined.

In the immediate future, we can expect to see computational storage working in IoT edge environments as we have said. After all, windfarm turbines have a tough and lonely job to do, so the more on-board intelligence we can kit them out with the better. Commercial aircraft could also be a key development area for computational storage. If your next transatlantic flight has an engine fuel system that not only collects terabytes of data during the journey, but also works to process some of it for safety checks, you might ride the turbulence a little more comfortably.

In the real (but still essentially virtualized) world, we may see most computational storage deployed as a new substrate for cloud across the planet’s datacenters. Giving our storage backbone the power to ‘do more’ before it interacts with us has to be a key enabler for hyperscale hyper-automated hyper-computing on the road to our quantum future.

If you’re still storage snoring, consider a decongestant, try sleeping on your belly more, or just put two old hard drives under your pillow.


